Search CORE

306 research outputs found

RePOR: Mimicking humans on refactoring tasks. Are we there yet?

Author: Antoniol Giuliano
Khomh Foutse
Morales Rodrigo
Publication venue
Publication date: 17/05/2019
Field of study

Refactoring is a maintenance activity that aims to improve design quality while preserving the behavior of a system. Several (semi)automated approaches have been proposed to support developers in this maintenance activity, based on the correction of anti-patterns, which are `poor' solutions to recurring design problems. However, little quantitative evidence exists about the impact of automatically refactored code on program comprehension, and in which context automated refactoring can be as effective as manual refactoring. Leveraging RePOR, an automated refactoring approach based on partial order reduction techniques, we performed an empirical study to investigate whether automated refactoring code structure affects the understandability of systems during comprehension tasks. (1) We surveyed 80 developers, asking them to identify from a set of 20 refactoring changes if they were generated by developers or by a tool, and to rate the refactoring changes according to their design quality; (2) we asked 30 developers to complete code comprehension tasks on 10 systems that were refactored by either a freelancer or an automated refactoring tool. To make comparison fair, for a subset of refactoring actions that introduce new code entities, only synthetic identifiers were presented to practitioners. We measured developers' performance using the NASA task load index for their effort, the time that they spent performing the tasks, and their percentages of correct answers. Our findings, despite current technology limitations, show that it is reasonable to expect a refactoring tools to match developer code

arXiv.org e-Print Archive

PolyPublie

Stack Overflow: A Code Laundering Platform?

Author: An Le
Antoniol Giuliano
Khomh Foutse
Mlouki Ons
Publication venue
Publication date: 01/01/2017
Field of study

Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribute-ShareAlike 3.0 Unported license that developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps, to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to code posting to Stack overflow or code reuse from Stack overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow.Comment: In proceedings of the 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER

arXiv.org e-Print Archive

Crossref

PolyPublie

Comprehension of Ads-supported and Paid Android Applications: Are They Different?

Author: Antoniol Giuliano
Guéhéneuc Yann-Gaël
Khomh Foutse
Saborido Rubén
Publication venue
Publication date: 01/01/2017
Field of study

The Android market is a place where developers offer paid and-or free apps to users. Free apps are interesting to users because they can try them immediately without incurring a monetary cost. However, free apps often have limited features and-or contain ads when compared to their paid counterparts. Thus, users may eventually need to pay to get additional features and-or remove ads. While paid apps have clear market values, their ads-supported versions are not entirely free because ads have an impact on performance. In this paper, first, we perform an exploratory study about ads-supported and paid apps to understand their differences in terms of implementation and development process. We analyze 40 Android apps and we observe that (i) ads-supported apps are preferred by users although paid apps have a better rating, (ii) developers do not usually offer a paid app without a corresponding free version, (iii) ads-supported apps usually have more releases and are released more often than their corresponding paid versions, (iv) there is no a clear strategy about the way developers set prices of paid apps, (v) paid apps do not usually include more functionalities than their corresponding ads-supported versions, (vi) developers do not always remove ad networks in paid versions of their ads-supported apps, and (vii) paid apps require less permissions than ads-supported apps. Second, we carry out an experimental study to compare the performance of ads-supported and paid apps and we propose four equations to estimate the cost of ads-supported apps. We obtain that (i) ads-supported apps use more resources than their corresponding paid versions with statistically significant differences and (ii) paid apps could be considered a most cost-effective choice for users because their cost can be amortized in a short period of time, depending on their usage.Comment: Accepted for publication in the proceedings of the IEEE International Conference on Program Comprehension 201

arXiv.org e-Print Archive

Crossref

PolyPublie

Grand Challenges of Traceability: The Next Ten Years

Author: Antoniol Giuliano
Cleland-Huang Jane
Hayes Jane Huffman
Vierhauser Michael
Publication venue
Publication date: 01/01/2017
Field of study

In 2007, the software and systems traceability community met at the first Natural Bridge symposium on the Grand Challenges of Traceability to establish and address research goals for achieving effective, trustworthy, and ubiquitous traceability. Ten years later, in 2017, the community came together to evaluate a decade of progress towards achieving these goals. These proceedings document some of that progress. They include a series of short position papers, representing current work in the community organized across four process axes of traceability practice. The sessions covered topics from Trace Strategizing, Trace Link Creation and Evolution, Trace Link Usage, real-world applications of Traceability, and Traceability Datasets and benchmarks. Two breakout groups focused on the importance of creating and sharing traceability datasets within the research community, and discussed challenges related to the adoption of tracing techniques in industrial practice. Members of the research community are engaged in many active, ongoing, and impactful research projects. Our hope is that ten years from now we will be able to look back at a productive decade of research and claim that we have achieved the overarching Grand Challenge of Traceability, which seeks for traceability to be always present, built into the engineering process, and for it to have "effectively disappeared without a trace". We hope that others will see the potential that traceability has for empowering software and systems engineers to develop higher-quality products at increasing levels of complexity and scale, and that they will join the active community of Software and Systems traceability researchers as we move forward into the next decade of research

arXiv.org e-Print Archive

PolyPublie

Documentation of Machine Learning Software

Author: Antoniol Giuliano
Hashemi Yalda
Nayebi Maleknaz
Publication venue
Publication date: 01/01/2020
Field of study

Machine Learning software documentation is different from most of the documentations that were studied in software engineering research. Often, the users of these documentations are not software experts. The increasing interest in using data science and in particular, machine learning in different fields attracted scientists and engineers with various levels of knowledge about programming and software engineering. Our ultimate goal is automated generation and adaptation of machine learning software documents for users with different levels of expertise. We are interested in understanding the nature and triggers of the problems and the impact of the users' levels of expertise in the process of documentation evolution. We will investigate the Stack Overflow Q/As and classify the documentation related Q/As within the machine learning domain to understand the types and triggers of the problems as well as the potential change requests to the documentation. We intend to use the results for building on top of the state of the art techniques for automatic documentation generation and extending on the adoption, summarization, and explanation of software functionalities.Comment: The paper is accepted for publication in 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2020

arXiv.org e-Print Archive

Crossref

PolyPublie

Comparison and Evaluation of Clone Detection Tools

Author: Ettore Merlo
Giuliano Antoniol
Jens Krinke
Rainer Koschke
Stefan Bellon
Publication venue
Publication date: 01/01/2007
Field of study

Many techniques for detecting duplicated source code (software clones) have been proposed in the past. However, it is not yet clear how these techniques compare in terms of recall and precision as well as space and time requirements. This paper presents an experiment that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC). Their clone candidates were evaluated by one of the authors as an independent third party. The selected techniques cover the whole spectrum of the state-of-the-art in clone detection. The techniques work on text, lexical and syntactic information, software metrics, and program dependency graphs

CiteSeerX

UCL Discovery

PolyPublie

Insider threat resistant SQL-injection prevention in PHP

Author: Antoniol Giuliano
Letarte Dominic
Merlo Ettore
Publication venue
Publication date: 01/05/2006
Field of study

Web sites are either static sites, programs, or databases. Very often they are a mixture of these three aspects integrating relational databases as a back-end. Web sites require configuration and programming attention to assure security, confidentiality, and trustiness of the published information. SQL-injection attacks rely on some weak validation of textual input used to build database queries. Maliciously crafted input may threaten the confidentiality and the security policies of Web sites relying on a database to store and retrieve information. Furthermore, insiders may introduce malicious code in a Web application, code that, when triggered by some specific input, for example, would violate security policies. This paper presents an original approach that combines static analysis, dynamic analysis, and code reengineering to automatically protect applications written in PHP from both malicious input (outsider threats) and malicious code (insider threats) that carry SQLinjection attacks. The paper also reports preliminary results about experiments performed on an old SQL-injection prone version of phpBB (version 2.0.0, 37193 LOC of PHP version 4.2.2 code). Results show that our approach successfully improved phpBB-2.0.0 resistance to SQLinjection attacks

PolyPublie

A Google-inspired error-correcting graph matching algorithm

Author: Antoniol Giuliano
Galinier Philippe
Kpodjedo Segla
Publication venue
Publication date: 01/07/2008
Field of study

Graphs and graph algorithms are applied in many different areas including civil engineering, telecommunications, bio-informatics and software engineering. While exact graph matching is grounded on a consolidated theory and has well known results, approximate graph matching is still an open research subject. This paper presents an error tolerant approximated graph matching algorithm based on tabu search using the Google-like PageRank algorithm. We report preliminary results obtained on 2 graph data benchmarks. The first one is the TC-15 database [14], a graph data base at the University of Naples, Italy. These graphs are limited to exact matching. The second one is a novel data set of large graphs generated by randomly mutating TC-15 graphs in order to evaluate the performance of our algorithm. Such a mutation approach allows us to gain insight not only about time but also about matching accuracy

PolyPublie

A feedback based quality assessment to support open source software evolution: the GRASS case study

Author: Ettore Merlo
Giuliano Antoniol
Markus Neteler
Salah Bouktif
Publication venue
Publication date: 01/01/2006
Field of study

Abstrac

CiteSeerX

PolyPublie

The Effect of Communication Overhead on Software Maintenance Project Staffing: a Search-Based Approach

Author: Fahim Qureshi
Giuliano Antoniol
Mark Harman
Massimiliano Di Penta
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Brooks ’ milestone ‘Mythical Man Month ’ established the observation that there is no simple conversion between peo-ple and time in large scale software projects. Communica-tion and training overheads yield a subtle and variable re-lationship between the person-months required for a project and the number of people needed to complete the task within a given time frame. This paper formalises several instantiations of Brooks’ law and uses these to construct project schedule and staffing instances — using a search-based project staffing and scheduling approach — on data from two large real world maintenance projects. The results reveal the impact of dif-ferent formulations of Brooks ’ law on project completion time and on staff distribution across teams, and the influ-ence of other factors such as the presence of dependen-cies between work packages on the effect of communication overhead

CiteSeerX

Crossref

PolyPublie